Fast and Accurate Protein False Discovery Rates on Large-Scale Proteomics Data Sets with Percolator 3.0
Abstract
Percolator is a widely used software tool that increases yield in shotgun proteomics experiments and assigns reliable statistical confidence measures, such as q values and posterior error probabilities, to peptides and peptide-spectrum matches (PSMs) from such experiments. Percolator's processing speed has been sufficient for typical data sets consisting of hundreds of thousands of PSMs. With our new scalable approach, we can now also analyze millions of PSMs in a matter of minutes on a commodity computer. Furthermore, with the increasing awareness of the need for reliable statistics on the protein level, we compared several easy-to-understand protein inference methods and implemented the best-performing method (grouping proteins by their corresponding sets of theoretical peptides and then considering only the best-scoring peptide for each protein) in the Percolator package. We used Percolator 3.0 to analyze the data from a recent study of the draft human proteome containing 25 million spectra (PMID: 24870542). The source code and Ubuntu, Windows, MacOS, and Fedora binary packages are available from http://percolator.ms/ under an Apache 2.0 license.
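The protein inference idea named in the abstract (group proteins by their sets of theoretical peptides, then score each group by its best-scoring peptide) can be illustrated with a minimal Python sketch. This is not Percolator's implementation: the function names, the `decoy_` prefix, the toy scores, and the simple decoy/target ratio used for the q values are all assumptions made for illustration.

```python
from collections import defaultdict


def group_proteins(protein_to_peptides):
    """Merge proteins whose sets of theoretical peptides are identical."""
    by_peptide_set = defaultdict(list)
    for protein, peptides in protein_to_peptides.items():
        by_peptide_set[frozenset(peptides)].append(protein)
    # Key each group by a sorted tuple of its member protein names.
    return {tuple(sorted(names)): peps for peps, names in by_peptide_set.items()}


def best_peptide_scores(groups, peptide_scores):
    """Score each group by its best identified peptide (higher is better)."""
    scores = {}
    for names, peptides in groups.items():
        observed = [peptide_scores[p] for p in peptides if p in peptide_scores]
        if observed:  # groups without any identified peptide are dropped
            scores[names] = max(observed)
    return scores


def q_values(scores, is_decoy):
    """Estimate q values with a plain decoy/target ratio (an assumption here)."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    fdrs, targets, decoys = [], 0, 0
    for names in ranked:
        if is_decoy(names):
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))
    # The q value is the smallest FDR at this rank or at any later rank.
    qvals, running_min = {}, float("inf")
    for names, fdr in zip(reversed(ranked), reversed(fdrs)):
        running_min = min(running_min, fdr)
        qvals[names] = min(running_min, 1.0)
    return qvals


# Hypothetical toy input: P1 and P2 share the same theoretical peptides.
protein_to_peptides = {
    "P1": {"AAAK", "CCCK"},
    "P2": {"AAAK", "CCCK"},
    "P3": {"DDDK"},
    "decoy_P4": {"KEEE"},
}
peptide_scores = {"AAAK": 2.5, "DDDK": 1.0, "KEEE": 1.8}

groups = group_proteins(protein_to_peptides)
scores = best_peptide_scores(groups, peptide_scores)
qvals = q_values(scores, lambda names: all(n.startswith("decoy_") for n in names))
for names in sorted(qvals, key=scores.get, reverse=True):
    print(names, scores[names], qvals[names])
```

In this toy input, P1 and P2 collapse into one group because their theoretical peptide sets are identical, and that group inherits the score of its best peptide. Taking the maximum rather than combining peptide scores keeps the method easy to reason about, which matches the abstract's emphasis on easy-to-understand inference methods.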
Related resources
iProphet: multi-level integrative analysis of shotgun proteomic data improves peptide and protein identification rates and error estimates.
The combination of tandem mass spectrometry and sequence database searching is the method of choice for the identification of peptides and the mapping of proteomes. Over the last several years, the volume of data generated in proteomic studies has increased dramatically, which challenges the computational approaches previously developed for these data. Furthermore, a multitude of search engines...
PepDistiller: A quality control tool to improve the sensitivity and accuracy of peptide identifications in shotgun proteomics.
In this study, we presented a quality control tool named PepDistiller to facilitate the validation of MASCOT search results. By including the number of tryptic termini, and integrating a refined false discovery rate (FDR) calculation method, we demonstrated the improved sensitivity of peptide identifications obtained from semitryptic search results. Based on the analysis of a complex data set, ...
Statistical and machine learning methods to analyze large-scale mass spectrometry data
As in many other fields, biology is faced with enormous amounts of data that contain valuable information yet to be extracted. The field of proteomics, the study of proteins, has the luxury of having large repositories containing data from tandem mass-spectrometry experiments, readily accessible for everyone who is interested. At the same time, there is still a lot to discover about pr...
A statistical model for identifying proteins by tandem mass spectrometry.
A statistical model is presented for computing probabilities that proteins are present in a sample on the basis of peptides assigned to tandem mass (MS/MS) spectra acquired from a proteolytic digest of the sample. Peptides that correspond to more than a single protein in the sequence database are apportioned among all corresponding proteins, and a minimal protein list sufficient to account for ...
Proteomics Applications in Health: Biomarker and Drug Discovery and Food Industry
Advances in genome sequencing have greatly propelled the understanding of the living world; however, they are insufficient for a full description of a biological system. Proteomics has therefore emerged as another large-scale platform for improving the understanding of biology. Proteomic experiments can be used for different aspects of clinical and health sciences such as food technology, biomark...
Journal title:
Volume 27, issue
Pages -
Publication date: 2016